Translation, cultural adaptation and assessment of psychometrics properties of the Extended Version of the Nordic Musculoskeletal Questionnaire (NMQ-E) in Persian language speaking people

Background This study aimed to translate and cross-culturally adapt the Extended Version of the Nordic Musculoskeletal Questionnaire (NMQ-E) into Persian (NMQ-E-P) and to evaluate its psychometric properties across nine body regions in a general population with different occupational tasks. Methods This cross-sectional study was designed according to the standard guidelines and the COSMIN checklist. The NMQ-E-P was produced through forward and backward translation, with consensus on the final draft. A Persian-speaking population (n = 571, age 38.24 ± 7.65 years, female = 46.2%) was recruited from industrial and office workers performing one of three occupational tasks: assembly, office work, and lifting. The psychometric properties evaluated were face validity (confirmed clarity, simplicity, and readability), content validity (via the content validity index), construct validity (through known-group validity), internal consistency (Cronbach’s α), and test-retest reliability (Kappa coefficient of agreement). Results No significant issues were found during the translation process. The NMQ-E-P showed adequate internal consistency for all regions (α ≥ 0.87). Test-retest reliability was examined with the Kappa coefficient of agreement, and all items, except those for the ankle region, showed very good agreement (Kappa coefficient = 0.87-1.0). Excellent ICC values were obtained for quantitative variables (ICC > 0.88) and good construct validity was revealed (p < 0.001). Conclusion The Persian version of the NMQ-E has very good validity and reliability and can be used by researchers and professionals to evaluate the prevalence of MSDs in nine body regions simultaneously.

Background
Work-related musculoskeletal disorders (WMSDs) include diseases that are characterized by pain, ache, disability, and impairment in soft tissues. These tissues include muscles, nerves, bones, tendons, and cartilage, and impairment may be caused or aggravated by work [1]. In general, it is reported that musculoskeletal disorders account for 20-30% of work absenteeism days in different countries, which is a relatively high and concerning statistic. The costs of this absenteeism, a consequence of WMSDs, place a heavy burden on societies [2,3]. In Iran, prevalence rates in the range of 38.1-50% are reported for the upper and lower back regions [4].
Due to the multifactorial etiology of MSDs, which includes physical, psychosocial, personal and organizational factors [5,6], several body regions are usually exposed to injury simultaneously. Several body sites, such as the back, neck, shoulders, hands, wrists, knees and hips, and work-related site-specific disorders, such as nonspecific neck and low back pain, osteoarthritis, epicondylitis, nerve entrapment and plantar fasciitis, are reported in the workplace [7][8][9]. Because previous instruments are limited in their ability to gather data on the prevalence of musculoskeletal pain in various regions of the body, the need for new reliable and valid instruments that can simultaneously generate data from multiple body regions is evident. In addition, WMSDs should be differentiated and screened appropriately from other disorders caused by diseases such as fibromyalgia or arthritis [10] or by falls, slips, trips, or similar incidents [11].
There are a variety of instruments to screen or predict MSDs in general and occupational settings. Observational methods are the most common way for practitioners to predict and identify physical exposures within a workplace [12]. Observational methods have several limitations: low reliability [13], reliance on the judgment of an external observer [14], usefulness for risk assessment only, a high time demand [15], and the need to combine several observational methods to assess the risk [16].
Self-report questionnaires are the second most common method used by ergonomists to screen or predict MSDs. They are straightforward to use, suitable for large numbers of participants, undemanding of time, and applicable to a wide range of tasks at low cost [17]. One of the most common and widely used tools in this regard is the general Standardized Nordic Musculoskeletal Questionnaire (NMQ). The NMQ was presented by Kuorinka et al. and has been used extensively to quantify musculoskeletal pain and related activity limitations [18]. Although the NMQ was introduced over 20 years ago, its psychometric properties have been updated and evaluated in recent years [19]. Some psychometric properties, such as reliability and validity, have been evaluated in different languages, including Portuguese [20], Italian [21], Greek [22], Turkish [23], and Persian [24]. The original questionnaire was developed for occupational settings, but recent psychometric evaluations were performed on specific groups, such as patients or workers, rather than on general or mixed populations [25].
Dawson et al. provided a new English version of the NMQ, the extended NMQ (NMQ-E), which covers nine body regions and can extract data about the prevalence, severity and consequences of MSDs in the daily life of the individual [19]. The NMQ-E can measure the point, annual, and lifetime prevalence of musculoskeletal symptoms [19]. It is a one-page, user-friendly questionnaire that can be completed in a short time for these nine body regions. To measure the same parameters with a specific outcome measure in different cultures and languages, cross-cultural adaptation must be performed [26]. Although there is a need for a tool that can simultaneously evaluate several body areas, a formal translation and cultural adaptation into Persian has yet to be performed. Several studies [27][28][29] have already used the NMQ-E in Persian, but the psychometric evaluation of the Persian NMQ-E has not been completed using a rigorous scientific methodology. Therefore, this study aimed to translate and cross-culturally adapt the NMQ-E for use in Persian, and to determine the psychometric properties of the Persian translated version (NMQ-E-P), including reliability (test-retest reliability and internal consistency) and validity (face, content, construct and convergent validity), in a general population with different occupational tasks.

Study design
We conducted the methods and results analysis of this cross-sectional study according to published standards for outcome measurement instruments in the workplace setting [30,31] and the COSMIN (COnsensus-based Standards for the selection of health status Measurement INstruments) criteria [32]. Consent was obtained from the developer of the original NMQ-E, Anna Dawson. The ethical committee of the University of Social Welfare and Rehabilitation Sciences approved the study (IR.USWR.REC.1401.260). Informed written consent was obtained from all participants. A 2-step approach was used in this study: (1) translation and cross-cultural adaptation, and (2) psychometric evaluation within a general population group with different occupational tasks.

Participants
The participants were recruited between 2022 and 2023 from different industries and offices, targeting occupations that involved lifting tasks, office tasks, and assembly (montage) tasks. According to published guidelines [33], a subject-to-item ratio is used for sample size determination in psychometric validation studies. It has been shown that around 92% of articles reported a subject-to-item ratio of ≥ 2, whereas 25% had a ratio of ≥ 20 [34]. We selected a ratio of 5 participants per item, with an allowance for 15% dropout, to ensure a minimum sample size. Consequently, with n = 571 (99 items in the NMQ-E) [18,35], we exceeded the required minimum of n = 495. A final convenience sample (n = 571) was used and participants were placed into one of three groups: lifting (n = 120), office workers (n = 260) and montage (n = 211). The overall mean age was 38.24 ± 7.65 years, the mean BMI was 24.47 ± 2.90 kg/m², and 46.2% of participants were female.
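The subject-to-item calculation described above can be sketched in a few lines of Python. The function names are hypothetical, and because the text does not state how the 15% dropout allowance was applied, the two inflation conventions shown are assumptions rather than the authors' procedure.

```python
import math


def minimum_sample_size(n_items: int, subjects_per_item: int) -> int:
    """Minimum sample size from a subject-to-item ratio."""
    return n_items * subjects_per_item


def inflate_multiplicative(n_min: int, dropout: float) -> int:
    """Assumed convention A: add the dropout fraction on top of the minimum."""
    return math.ceil(n_min * (1.0 + dropout))


def inflate_retention(n_min: int, dropout: float) -> int:
    """Assumed convention B: recruit enough that n_min remain after dropout."""
    return math.ceil(n_min / (1.0 - dropout))


n_min = minimum_sample_size(n_items=99, subjects_per_item=5)  # 99 x 5 = 495
print("minimum:", n_min)
print("with 15% allowance (convention A):", inflate_multiplicative(n_min, 0.15))
print("with 15% allowance (convention B):", inflate_retention(n_min, 0.15))
```

The two conventions give different recruitment targets, so a report of this kind is easier to check when it states which one was used.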
The eligibility criteria were: (1) aged 20-55 years; (2) employed in the same job for the preceding 12 months; (3) willingness to participate in the study; and (4) ability to read, write and understand the questions in the Persian language.

The Extended Version of the Nordic Musculoskeletal Questionnaire (NMQ-E)
The NMQ-E was originally developed by Anna P. Dawson et al. The NMQ-E [18] consists of 99 questions with dichotomous yes/no response options that provide information on the onset, prevalence and consequences of musculoskeletal pain in nine separate body regions (the neck, shoulder, upper back, elbow, wrist/hand, low back, hip/thigh, knee, ankle/foot) [35]. The questions address, in order, the respondent's lifetime experience, prevalence, and consequences of pain. The questions for one body region should be answered horizontally, across the row, before the respondent progresses to the next body region. The questionnaire can be completed in a short time, requiring approximately 10 to 15 min [25].

National Aeronautics and Space Administration Task load index (NASA-TLX)
The NASA-TLX is a multi-dimensional self-report scale that is designed to estimate the mental workload in occupational settings. It incorporates six subscales: physical demand, mental demand, temporal demand, effort, performance, and frustration [36]. Each dimension is rated for the level of demand on a 7-point scale. Increments of high, medium, and low estimates for each point result in 21 gradations on the scales. To discriminate the mental workload between the participants, the total workload score was compared between the three subgroups. Further, the physical subscale was used as an effective and straightforward approach to detect the physical workload in the three different groups, given the high correlations reported between the weighted overall score and the Raw-TLX indices [37,38]. The NASA-TLX is used extensively in Persian studies [39] and its Persian version was used in the current study [40]. The two NASA-TLX subscales, physical and mental, were used as a simple way to screen the level of load and confirm the difference between the three groups.

Translation and cross-cultural adaptation of the NMQ-E
Translation and cultural adaptation were conducted according to the accepted guidelines in five steps: forward translation, synthesis, backward translation, consolidation, and pre-final testing [41]. Two independent native bilingual Persian translators performed the forward translation. One translator (T1) had a PhD in physiotherapy with experience in translating outcome measures, and the second translator (T2) was blinded to the research circumstances. A brief explanation of the tool's use, the target population, and the purpose of the translation was provided to the translators to increase the quality of the final product. The first translator, with the help of one of the researchers, synthesized a general version of the NMQ-E. Two additional bilingual translators (T3, T4), blinded to the original version, back-translated the synthesized version into English. An expert committee composed of an occupational therapist, a physiotherapist, two ergonomists, an occupational medicine specialist, four translators, and a methodologist reviewed all translations, the consensus version, and the original questionnaire. Following discussion of the semantic and conceptual equivalence and of idiomatic discrepancies, consensus on a pre-final NMQ-E was obtained. The pre-final version was sent to the developer of the questionnaire (A. Dawson) to confirm its equivalence with the original version.
In the pilot stage, face validity was assessed as a subjective measure through an interview process to determine the understandability, simplicity, clarity and readability of the questionnaire for its users [42]. Fifteen participants from the same working environment were instructed to read and complete the questionnaire under the supervision of one of the researchers. They were asked to highlight any phrase or word that was difficult to understand or ambiguous. Participants were also asked for possible suggestions and alternatives for ambiguous items. Consensus on the clarity of language, simplicity, and readability was obtained through a focus group session, and the final Persian version of the NMQ-E was created for the psychometric evaluation.

Psychometric assessment of the NMQ-E
Face validity
The face validity of the Persian NMQ-E was established through a qualitative analysis of the comments provided by the 15 subjects in the pilot study.

Content validity
Content validity depends on the extent to which an empirical measure reflects a specific domain of content. Initially, the Content Validity Index at item level (I-CVI) was determined from the proportion of expert agreement on the individual items. The average of the I-CVIs was then calculated and taken as the average scale-level CVI (S-CVI/Ave) [43]. To confirm the content validity, eight experts from different disciplines, including a physiotherapist, ergonomist, occupational therapist and orthopedic surgeon, rated the relevance of the content on a 4-point Likert scale (1 = not relevant, 4 = very relevant). The proportion of experts who gave a rating of 3 or 4 was taken as the I-CVI for each item, from which the S-CVI was derived [44,45]. With eight experts, the acceptable cut-off value is reported to be about 0.78 (78%) [44].
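As an illustration of the content validity indices described above, the following minimal Python sketch computes the I-CVI for each item and the S-CVI/Ave. The expert ratings are invented for the example and are not the study's data; the function names are hypothetical.

```python
def item_cvi(ratings, relevant_from=3):
    """I-CVI: proportion of experts who rated the item 3 or 4 on the 4-point scale."""
    if not ratings:
        raise ValueError("at least one expert rating is required")
    return sum(1 for r in ratings if r >= relevant_from) / len(ratings)


def scale_cvi_average(ratings_per_item):
    """S-CVI/Ave: mean of the item-level I-CVI values."""
    i_cvis = [item_cvi(ratings) for ratings in ratings_per_item]
    return sum(i_cvis) / len(i_cvis)


# Hypothetical ratings from eight experts for three items (1 = not relevant, 4 = very relevant).
example_ratings = [
    [4, 4, 3, 4, 4, 3, 4, 4],  # all eight experts rate 3 or 4
    [4, 3, 2, 4, 4, 3, 4, 4],  # seven of eight rate 3 or 4
    [2, 3, 2, 4, 4, 3, 4, 1],  # five of eight rate 3 or 4
]

for number, ratings in enumerate(example_ratings, start=1):
    print(f"item {number}: I-CVI = {item_cvi(ratings):.3f}")
print(f"S-CVI/Ave = {scale_cvi_average(example_ratings):.3f}")
```

In this invented example the third item would fall below the 0.78 cut-off cited for eight experts and would be a candidate for revision.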

Construct validity
Construct validity was tested by verifying two hypotheses. The first was that the physical workload differs between the three occupational subgroups (known-group validity), so that participants with a higher physical workload would report a greater number of involved body regions during the previous month or year. The second hypothesis was that the number of involved body regions during the preceding month correlates with the number of visits to doctors and the amount of medication taken by the individual.
Reliability
Two concepts, test-retest reliability and internal consistency, were considered for the reliability evaluation. The same questionnaire was completed twice by 110 participants (mean age 28.40 ± 7.32 years) who were selected randomly from the sample of 571 participants. Repeated measures were made at an interval of 1-3 days, during which the participant's condition remained stable. The NMQ-E was completed within the individual's workplace, at the same time and place on both occasions. Cohen's unweighted Kappa was used for the test-retest reliability of dichotomous variables. Kappa values were categorized as poor (0 to 0.2), fair (0.21 to 0.40), moderate (0.41 to 0.60), good (0.61 to 0.80), and very good (0.81 to 1.00) [46]. The intraclass correlation coefficient (ICC 2,1) was used to determine the test-retest reliability of continuous variables. An ICC 2,1 value > 0.70 indicated excellent reliability [47].
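To make the Kappa statistic and its interpretation bands concrete, here is a minimal Python sketch of Cohen's unweighted Kappa for paired test and retest answers. The answers are invented, the function names are hypothetical, and treating negative values as "poor" is an assumption, since the bands quoted above start at 0.

```python
def cohen_kappa(test, retest):
    """Cohen's unweighted Kappa for paired labels; None when it is undefined."""
    n = len(test)
    if n == 0 or n != len(retest):
        raise ValueError("test and retest must be non-empty and of equal length")
    labels = set(test) | set(retest)
    observed = sum(a == b for a, b in zip(test, retest)) / n
    expected = sum((test.count(label) / n) * (retest.count(label) / n) for label in labels)
    if expected == 1.0:
        # Every answer is the same constant on both occasions: no chance-corrected value exists.
        return None
    return (observed - expected) / (1.0 - expected)


def agreement_category(kappa):
    """Map a Kappa value to the bands quoted in the text (values below 0 assumed 'poor')."""
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "good"
    return "very good"


# Hypothetical yes/no answers of ten participants on two occasions (one answer changes).
first = ["yes"] * 4 + ["no"] * 6
second = ["yes"] * 3 + ["no"] * 7

kappa = cohen_kappa(first, second)
print(f"kappa = {kappa:.3f} ({agreement_category(kappa)})")
print("constant answers:", cohen_kappa(["no"] * 5, ["no"] * 5))
```

The `None` branch corresponds to items on which every respondent gives the same answer at both administrations, in which case chance agreement equals 1 and the coefficient cannot be computed.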
All statistical analyses were performed using the Statistical Package for the Social Sciences version 22 (SPSS 22) for Windows. Statistical significance was set at P < 0.05.
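The ICC 2,1 used for continuous variables is the two-way random-effects, absolute-agreement, single-measurement form. The following Python sketch computes it from the usual mean squares; it is an illustration with invented ages of onset, not a reproduction of the SPSS output, and the function name is hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) from a table with one row per participant and one column per occasion."""
    n = len(scores)
    if n < 2:
        raise ValueError("at least two participants are required")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("every participant needs the same number (>= 2) of occasions")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


# Hypothetical age of onset (years) reported at test and retest by five participants.
onset_ages = [[30, 30], [25, 26], [41, 40], [35, 35], [28, 28]]
print(f"ICC(2,1) = {icc_2_1(onset_ages):.3f}")
```

Because the occasion (column) variance enters the denominator, a systematic shift between test and retest lowers this coefficient even when participants keep the same rank order.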

Phase-I: translation and cross-cultural adaptation
There were no significant issues while translating the NMQ-E into Persian. Chiropractic is not available as an academic discipline in Iran and the general population is consequently not familiar with it, hence it was removed from the questionnaire. During the face validity assessment, the participants generally stated that the instructions of the questionnaire were simple to understand and complete. Participants could complete the tool quickly and without difficulty. Therefore, the final version in Persian was obtained without significant changes to the original version.

Participants
Six hundred subjects were invited to participate in the study voluntarily. The data of 29 participants were incomplete and they were excluded from the study. Among the remaining 571 participants, acceptability was assessed by the proportion of missing item responses, with high overall completion rates (≥ 90%) taken as evidence of questionnaire acceptability. The average completion rate across all items was above 94%, indicating that the questionnaire was acceptable. The mean ± SD of age and Body Mass Index (BMI) of the participants were 38.24 ± 7.65 years and 24.47 ± 2.90 kg/m², respectively. A total of n = 381 (53.8%) of respondents were male. The demographic characteristics of the participants (n = 571), overall and according to the three different occupational tasks, are presented in Table 1. Categorical variables are presented as frequency (n) and percentage (%), and continuous variables as mean and standard deviation (SD).
The body region with the highest lifetime prevalence of MSDs in the general population was the low back (n = 671, 58.7%). The lifetime prevalence in the three groups of office workers, montage workers and lifting workers was 68.8%, 44.1%, and 62.5%, respectively. The body area most often reported as painful during the preceding year and month was the low back, with 48.7% and 48.9%, respectively. For point prevalence, the neck was the region with the highest prevalence of pain (27.9%). The MSD prevalence results for the general sample and for the separate occupational tasks are provided in Tables 2 and 3, respectively.

Reliability
The ICC 2,1 values for the age of onset of the 'trouble' question showed excellent test-retest reliability (ICC > 0.88), with the lowest value reported for the upper back (0.88) and the highest values for the wrist and hips (0.98). The Kappa coefficients of agreement for the remaining questions, including the prevalence and severity questions, are presented in Tables 4 and 5. Very good agreement was obtained for all body regions except the ankle region. For some variables, the agreement coefficient could not be computed because the same constant answers were given on both occasions. The internal consistency of all regions exceeded 0.70, indicating acceptable internal consistency reliability, as shown in Table 4.
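For readers who want to see how an internal consistency value of this kind is obtained, the following Python sketch computes Cronbach's α from item and total-score variances. The text does not specify which items entered the calculation for each region, so the yes/no items coded 1/0 below are purely hypothetical, as is the function name.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    n_items = len(responses[0])
    if n_items < 2 or len(responses) < 2:
        raise ValueError("at least two items and two respondents are required")
    item_variances = [variance([row[j] for row in responses]) for j in range(n_items)]
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are identical")
    return (n_items / (n_items - 1)) * (1.0 - sum(item_variances) / total_variance)


# Hypothetical answers (1 = yes, 0 = no) of five respondents to three items of one region.
region_items = [
    [1, 1, 0],
    [0, 0, 0],
    [1, 1, 1],
    [1, 0, 1],
    [0, 0, 0],
]
print(f"Cronbach's alpha = {cronbach_alpha(region_items):.2f}")
```

Alpha rises when respondents who answer "yes" to one item tend to answer "yes" to the others, which is why it is usually computed within a region rather than across unrelated regions.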

Content validity
All eight experts endorsed the relevance and validity of the NMQ-E-P items (I-CVI = 1.00; S-CVI/Ave = 1.00).

Construct validity
A one-way ANOVA was performed to compare the effect of group on the mental and physical demand variables. It revealed a statistically significant difference in the physical workload variable between at least two groups (F(2, 566) = 172.9, p < 0.001). No significant difference was shown for the mental workload between groups (F(2, 566) = 4.05, p = 0.1). Tukey's HSD test for multiple comparisons found that the mean value of physical workload was significantly different between each pair of groups (Table 6).
Our hypothesis was that the number of body regions involved in the last year and month would be greater in the groups with higher physical workload. A one-way ANOVA showed a statistically significant difference between at least two groups in the number of involved body regions during the last year (F(2, 586) = 3.37, p = 0.035) and month (F(2, 575) = 4.91, p = 0.008). Accordingly, Tukey's HSD test for multiple comparisons found that the mean number of involved regions was significantly different between each pair of groups (Table 6).
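The F statistics and degrees of freedom reported for these comparisons follow the standard one-way ANOVA decomposition, sketched below in plain Python. The scores are invented and the function name is hypothetical; obtaining the p-value additionally requires the F distribution (for example `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [fmean(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1          # three groups give 2
    df_within = n_total - k     # total observations minus number of groups
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Hypothetical physical-demand scores for office, montage and lifting workers.
office, montage, lifting = [1, 2, 3], [2, 3, 4], [5, 6, 7]
f_value, df_b, df_w = one_way_anova([office, montage, lifting])
print(f"F({df_b}, {df_w}) = {f_value:.2f}")
```

Since the within-group degrees of freedom equal the number of analysed observations minus three, a reported F(2, 566) corresponds to 569 observations entering that particular test.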
The analysis of the second hypothesis showed a significant correlation in the general sample between the number of involved body regions and both the number of visits to the doctor (r = 0.52, 95% CI = 0.45-0.60, P < 0.001) and the amount of medication taken (r = 0.46, 95% CI = 0.39-0.54, P < 0.001).
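A correlation coefficient with a confidence interval of this kind can be obtained as sketched below. The text does not state how its intervals were computed, so the Fisher z-transformation used here is an assumption and need not reproduce the reported bounds; the data and function names are likewise hypothetical.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z-transformation."""
    z = math.atanh(r)
    standard_error = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return math.tanh(z - z_crit * standard_error), math.tanh(z + z_crit * standard_error)


# Hypothetical data: number of involved body regions and number of doctor visits.
regions = [0, 1, 1, 2, 3, 4, 5]
visits = [0, 0, 1, 1, 2, 2, 4]
r = pearson_r(regions, visits)
low, high = fisher_ci(r, n=len(regions))
print(f"r = {r:.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

With a sample in the hundreds the interval is narrow, so even a moderate correlation is clearly distinguishable from zero.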

Discussion
The purpose of the current study was to develop a screening tool for musculoskeletal symptoms and the assessment of their severity in Persian-speaking people. Further, the related psychometric properties, including reliability and validity, were evaluated using the COSMIN standards. The NMQ-E is derived from the NMQ and, because of its user-friendly and comprehensive format, has been translated into different languages in different formats, both online and paper-based, for use in varied cultural and linguistic settings [25,35,48]. During the translation process, no significant difficulties were found except for the term 'chiropractic', which was removed because this specialization does not have an academic center in Iran and is not a familiar term for Iranian Persian-speaking people. This revision parallels the one applied in the Turkish version for the same cultural reasons [25].
The minimum values of the Kappa coefficient for lifetime, annual, monthly and point prevalence were 0.89, 0.92, 0.93, and 0.94, respectively, which indicated very good questionnaire stability. It appears that 'memory decay', a recognized and normal phenomenon in daily life, was only a minor contributing factor to the results, because the test-retest interval was very short (1-3 days) and fell within a period in which the participants' health status was unchanged. This 1-3 day interval was selected so that the subjects' health condition would not have changed. In contrast, Dawson et al. [19] used a time interval of 24 h and Pugh used 4 to 7 days [35].
In comparison with the original version [19], our reliability results showed relatively lower Kappa coefficient values. In contrast to both the original and the Hebrew versions, in this study 'lifetime prevalence' demonstrated the lowest values whereas point prevalence displayed the highest [19,48]. This could be due to cultural variation or to the difference in the retest period, factors that will need to be considered in future research.
The internal consistency of the NMQ-E was not examined by Dawson et al., whereas in this study an alpha range of 0.87-0.95 was obtained, which highlighted the relationship between items without the presence of item redundancy. In line with our study, alpha values > 0.78 and ICC values > 0.88 were also reported in the Turkish version [25]. This is in contrast to the Hebrew version, where lower internal consistency was reported [48], indicating that no relationship was present between items. In a modified NMQ-E version in a nursing population, high internal consistency was reported for each region and for the related subscales of Severity of symptoms and Impact on activities [35]. Generally, internal consistency values should be interpreted cautiously and are probably a consequence of chance, because there is no logical reason in this setting for consistency between, for example, knee pain and neck pain. The ICC value for the age of onset of the trouble item was very similar to that of the original version [19] and the modified NMQ-E [35]. It seems that the online NMQ-E version may have sufficient psychometric properties for health professional use.
For eight items concerning the consequences of pain, the Kappa coefficient could not be computed because all responses were negative, a finding that was also present in the original version [19].
To the best of our knowledge, there are only two other NMQ-E versions, the Turkish [25] and the online Hebrew [48]. Additionally, a modified online NMQ-E version was provided to measure nurses' fitness [35]. In this study, content and construct validity were evaluated using the CVI and hypothesis testing, respectively. The current findings are consistent with those of the Turkish [25] and modified NMQ-E [35] versions, and content validity was judged adequate using the CVI.
Hypothesis testing was used to evaluate construct validity, as there appears to be no available tool that can simultaneously examine several body regions in terms of the prevalence and severity of symptoms. In the Turkish version, the Cornell Musculoskeletal Discomfort Questionnaire (CMDQ) was used to assess construct validity, and a correlation between the two questionnaires was noted and confirmed [25]. However, the CMDQ criterion and the comparative methodology were not selected in this study because the validity of such a comparison is questionable given the binary response option (yes or no). As a consequence, a correlation analysis between the items is not statistically sound and should not be made. Further, not all body regions evaluated in the NMQ-E are equally represented or present in the CMDQ, so a direct pairwise comparison is not possible. A final limitation of the CMDQ is the lack of a standardized Persian version. Consequently, construct validity was evaluated through hypothesis testing. One of our hypotheses was that subjects with a higher physical workload were more likely to have a greater number of involved body regions. This was examined using ANOVA, which demonstrated that physical workload was significantly different among the three groups in the following order: lifting tasks > montage tasks > office work tasks. Accordingly, we confirmed our expectation regarding the difference in involved body regions between the groups. Further, we anticipated that participants with higher physical workloads, and consequently a higher number of involved body regions, would likely have more doctor visits and require more medication. This was demonstrated by the positive correlation.
The NMQ-E was originally derived from the NMQ to gather more data on the prevalence and impact of musculoskeletal pain on daily activity. Therefore, the psychometric properties of the NMQ are comparable to those of some sections of the NMQ-E. The NMQ has been translated into Chinese [49], Turkish [50], Persian [51], Brazilian Portuguese [52] and Greek [53]. The Kappa coefficients reported in these studies were mostly between 0.63 and 1.0, indicating good reliability, which is similar to our findings in the related sections. This study was conducted to develop the NMQ-E-P and then assess its psychometric properties. In general, the reliability statistics are sufficiently strong for the instrument to be used for screening in research and epidemiological studies of Persian-speaking subjects.

Limitation and strength
Several limitations of this study are noted. The distribution of participants across the groups was not equal, simply because of the random sampling methodology. Another limitation was that no criterion or similar Persian NMQ-E instrument was available for the evaluation of criterion validity. The 1-3 day interval for the test-retest reliability evaluation was a further limitation, although an interval of more than three days would increase the chance of a change in the subjects' health condition. Further, the dichotomous nature of the NMQ-E items limits factor analysis of the instrument, as the item responses are independent and there is no defined total score. One of the study's strengths was that the questionnaire was administered to a combined population with a relatively high prevalence of musculoskeletal disorders. Consequently, the generalizability of the results is higher than that for other language versions. Additionally, the evaluation of content and construct validity provides further support for this questionnaire and strengthens the current study.

Conclusion
The Persian version of the NMQ-E was shown to have very good validity and reliability in a sample of Iranian Persian-speaking workers from both office and industry settings. The NMQ-E-P was shown to have sound characteristics and to be applicable to the industrial and office settings for which it is intended. The NMQ-E-P can be used by researchers and professionals to evaluate the prevalence of MSDs in nine body regions simultaneously. Further research is required in prospective populations to verify these psychometric findings and to determine the relevance of change and any other variables that can be altered over time.

Table 1
Descriptive demographic characteristics of all participants and of the three groups

Table 2
Prevalence of MSDs in all participants, in percentage (%)

Table 3
Prevalence of MSDs in the three groups of participants, in percentage (%)

Table 4
Test-retest reliability of the NMQ-E in combined occupational settings (n = 110): age of onset and prevalence questions, and internal consistency

Table 5
The Kappa coefficient and confidence interval values indicating test-retest reliability of the NMQ-E in combined occupational settings (n = 110): questions about consequences of pain
* The Kappa coefficient could not be computed

Table 6
Multiple comparisons (Bonferroni test) among the three groups for the dependent variables
* The mean difference is significant at the 0.05 level